ABSTRACT

Differential privacy is a promising privacy-preserving paradigm for 
statistical query processing over sensitive data. It works by inject- 
ing random noise into each query result, such that it is provably 
hard for the adversary to infer the presence or absence of any indi- 
vidual record from the published noisy results. The main objective 
in differentially private query processing is to maximize the accu- 
racy of the query results, while satisfying the privacy guarantees. 
Previous work, notably [16], has suggested that with an appropriate
strategy, processing a batch of correlated queries as a whole
achieves considerably higher accuracy than answering them indi- 
vidually. However, to our knowledge there is currently no practical 
solution to find such a strategy for an arbitrary query batch; ex- 
isting methods either return strategies of poor quality (often worse 
than naive methods) or require prohibitively expensive computa- 
tions for even moderately large domains. Motivated by this, we 
propose the Low-Rank Mechanism (LRM), the first practical dif- 
ferentially private technique for answering batch queries with high 
accuracy, based on a low rank approximation of the workload ma- 
trix. We prove that the accuracy provided by LRM is close to the 
theoretical lower bound for any mechanism to answer a batch of 
queries under differential privacy. Extensive experiments using real 
data demonstrate that LRM consistently outperforms state-of-the- 
art query processing solutions under differential privacy, by large 
margins. 

1. INTRODUCTION 

Differential privacy [11] is an emerging paradigm for publishing
statistical information over sensitive data, with strong and rigorous 
guarantees on individuals' privacy. Since its proposal, differential 
privacy has attracted extensive research efforts in areas such as
cryptography, algorithms, databases [8, 15, 16, 24, 27, 28, 29],
data mining [13] and machine learning [3, 4, 25]. The main
idea of differential privacy is to inject random noise into aggregate
query results, such that the adversary cannot infer, with high
confidence, the presence or absence of any given record r in the
dataset, even if the adversary knows all other records in the dataset
except for r. This paper follows a popular definition of differential
privacy, called $\epsilon$-differential privacy, in which the adversary's
maximum confidence in inferring private information is controlled
by a user-specified parameter $\epsilon$ called the privacy budget. Given $\epsilon$,
the main goal of query processing under $\epsilon$-differential privacy is to
maximize the utility/accuracy of the (noisy) query answers, while
satisfying the above privacy requirements.

This work focuses on a common class of queries called linear
counting queries, which is the basic operation in many statistical
analyses. Similar ideas apply to other types of linear queries, e.g.,
linear sums. Figure 1(a) illustrates an example electronic medical
record database, where each record corresponds to an individual.
Figure 1(b) shows the exact number of HIV+ patients in each state,
which we refer to as unit counts. A linear counting query in this
example can be any linear combination of the unit counts. For
instance, let $x_{NY}$, $x_{NJ}$, $x_{CA}$, $x_{WA}$ be the patient counts in states
NY, NJ, CA, and WA respectively; one possible linear counting
query is $x_{NY} + x_{NJ} + x_{CA} + x_{WA}$, which computes the total
number of HIV+ patients in the four states listed in our example.
Another example linear counting query is $x_{NY}/19 + x_{NJ}/8 + x_{CA}/37$,
which calculates the weighted average of patient counts
in states NY, NJ and CA, with weights set according to their
respective population sizes. In general, we are given a database with
n unit counts, and a batch QS of m linear counting queries. The
goal is to answer all queries in QS under $\epsilon$-differential privacy, and
maximize the expected overall accuracy of the queries.
(a) Patient records (b) Statistics on HIV+ patients

Figure 1: Example medical record database 

Straightforward approaches to answering a batch of linear counting
queries usually lead to sub-optimal result accuracy. One naive
solution, referred to as noise on queries (NOQ), is to process each
query independently, e.g., using the Laplace Mechanism [11]. This
method fails to exploit the correlations between different queries.
Consider a batch of three different queries $q_1 = x_{NY} + x_{NJ} + x_{CA} + x_{WA}$,
$q_2 = x_{NY} + x_{NJ}$, $q_3 = x_{CA} + x_{WA}$. Clearly, the
three queries are correlated since $q_1 = q_2 + q_3$. Thus, an alternative
strategy for answering these queries is to process only $q_2$ and $q_3$,
and use their sum to answer $q_1$. As will be explained in Section 3,
the amount of noise added to query results depends upon the
sensitivity of the query set, which is defined as the maximum possible
total change in query results caused by adding or removing a single
record in the original database. In our example, the sensitivity
of the query set $\{q_2, q_3\}$ is 1, because adding/removing a patient
record in Figure 1 affects at most one of $q_2$ and $q_3$ (i.e., $q_2$ if the
record is associated with state NY or NJ, and $q_3$ if the state is CA or
WA), by exactly 1. On the other hand, the query set $\{q_1, q_2, q_3\}$ has
a sensitivity of 2, since a record in the above 4 states affects both
$q_1$ and one of $q_2$ and $q_3$. According to the Laplace mechanism,
the variance of the noise added to each query is $2\Delta^2/\epsilon^2$, where $\Delta$
is the sensitivity of the query set, and $\epsilon$ is the user-specified
privacy budget. Therefore, processing $\{q_1, q_2, q_3\}$ directly incurs a
noise variance of $8/\epsilon^2$ for each query; on the other hand, executing
$\{q_2, q_3\}$ leads to a noise variance of $2/\epsilon^2$ for each of $q_2$ and $q_3$, and
their sum $q_1 = q_2 + q_3$ has a noise variance of $2 \times 2/\epsilon^2 = 4/\epsilon^2$.
Clearly, the latter method obtains higher accuracy for all queries.

Another simple solution, referred to as noise on data (NOD), is
to process each unit count under differential privacy, and combine
them to answer the given linear counting queries. Continuing the
example, this method computes the noisy counts for $x_{NY}$, $x_{NJ}$,
$x_{CA}$ and $x_{WA}$, and uses their linear combinations to answer $q_1$, $q_2$,
and $q_3$. This approach overlooks the correlations between different
unit counts. In our example, $x_{NY}$ and $x_{NJ}$ (and similarly, $x_{CA}$ and
$x_{WA}$) are either both present or both absent in every query, and,
thus, can be seen as a single entity. Processing them as independent
queries incurs unnecessary accuracy costs when re-combining
them. In the example, NOD adds noise with variance $2/\epsilon^2$ to each
unit count, and their combinations to answer $q_1$, $q_2$, and $q_3$ have
noise variance $8/\epsilon^2$, $4/\epsilon^2$ and $4/\epsilon^2$ respectively. NOD's result
utility is also worse than the above-mentioned strategy of processing
$q_2$ and $q_3$, and adding their results to answer $q_1$.
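The three variance figures quoted above can be reproduced with a few lines of NumPy. The sketch below is illustrative only: the helper name `per_query_variance` and the convention of writing the workload as W = BL (strategy rows in L, recombination weights in B) are assumptions made here for the example, anticipating the notation of Section 4. Variances are reported for $\epsilon = 1$, i.e., in units of $1/\epsilon^2$.

```python
import numpy as np

def per_query_variance(B, L, eps=1.0):
    """Noise variance of each workload query when the strategy queries (rows of L)
    are answered with the Laplace mechanism and recombined through B."""
    sensitivity = np.abs(L).sum(axis=0).max()      # largest column absolute sum of L
    strategy_var = 2.0 * sensitivity**2 / eps**2   # variance of Lap(sensitivity / eps)
    return strategy_var * (B**2).sum(axis=1)       # independent noise: variances add

# Columns: NY, NJ, CA, WA. Rows: q1, q2, q3.
W = np.array([[1., 1, 1, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1]])

noq = per_query_variance(np.eye(3), W)     # noise on queries: the strategy is W itself
nod = per_query_variance(W, np.eye(4))     # noise on data: the strategy is the unit counts
# Strategy from the text: answer q2 and q3 only, then set q1 = q2 + q3.
B = np.array([[1., 1],
              [1, 0],
              [0, 1]])
strategy = per_query_variance(B, W[1:])

print(noq, nod, strategy)
```

Under these assumptions NOQ gives 8 for every query, NOD gives 8, 4 and 4, and the two-query strategy gives 4, 2 and 2, matching the figures in the text.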

In general, the query set QS may exhibit complex correlations
among different queries and among different unit counts. As a
consequence, it is non-trivial to obtain the best strategy to answer
QS under differential privacy. For instance, consider the following
query set:

$$q_1 = 2x_{NJ} + x_{CA} + x_{WA}$$

$$q_2 = x_{NJ} + 2x_{WA}$$

$$q_3 = x_{NY} + 2x_{CA} + 2x_{WA}$$

NOQ is clearly a poor choice, since it incurs a sensitivity of 5
(e.g., a record of state WA affects $q_1$ by 1, and $q_2$ and $q_3$ by 2
each). The sensitivity of NOD remains 1, and it answers $q_1$, $q_2$,
and $q_3$ with noise variance $12/\epsilon^2$, $10/\epsilon^2$ and $18/\epsilon^2$ respectively,
leading to a sum-square error (SSE) of $40/\epsilon^2$. The optimal strategy
in terms of SSE in this case computes the noisy results of $x_{NJ}$ and
$x_{WA}$, as well as $q'_1 = x_{NY}/3 + x_{CA}$ and $q'_2 = 2x_{NY}/3$. Then,
it obtains the results for $q_1$, $q_2$, and $q_3$ as follows.

$$q_1 = q'_1 + 2x_{NJ} + x_{WA} - q'_2/2$$

$$q_2 = x_{NJ} + 2x_{WA}$$

$$q_3 = 2q'_1 + 2x_{WA} + q'_2/2$$

The sensitivity of the above method is also 1, and it answers
$q_1$, $q_2$, and $q_3$ with noise variance $12.5/\epsilon^2$, $10/\epsilon^2$ and $16.5/\epsilon^2$
respectively, resulting in an SSE of $39/\epsilon^2$. Observe that there is no
simple pattern in the query set or the optimal strategy. Since there
is an infinite space of possible strategies, searching for the best one
is a challenging problem.
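As a sanity check on this example, the strategy can be written as a pair of matrices and verified numerically. The matrix names `B` and `L` and the column order (NY, NJ, CA, WA) are conventions chosen for this sketch; all variances are in units of $1/\epsilon^2$.

```python
import numpy as np

# Columns: NY, NJ, CA, WA. Rows: q1, q2, q3.
W = np.array([[0., 2, 1, 1],
              [0, 1, 0, 2],
              [1, 0, 2, 2]])

# Strategy queries (rows): x_NJ, x_WA, q1' = x_NY/3 + x_CA, q2' = 2*x_NY/3.
L = np.array([[0., 1, 0, 0],
              [0, 0, 0, 1],
              [1/3, 0, 1, 0],
              [2/3, 0, 0, 0]])

# Recombination weights, e.g. q1 = 2*x_NJ + x_WA + q1' - q2'/2.
B = np.array([[2., 1, 1, -0.5],
              [1, 2, 0, 0],
              [0, 2, 2, 0.5]])

noq_sensitivity = np.abs(W).sum(axis=0).max()      # NOQ: largest column sum of W
nod_variances = 2.0 * (W**2).sum(axis=1)           # NOD: unit counts have sensitivity 1

strategy_sensitivity = np.abs(L).sum(axis=0).max()
strategy_variances = 2.0 * strategy_sensitivity**2 * (B**2).sum(axis=1)

print(noq_sensitivity, nod_variances.sum(), strategy_variances.sum())
```

The product `B @ L` reproduces W exactly, and the sums of the two variance vectors are the SSE values 40 and 39 quoted in the text.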

Li et al. [16] first formalize the above observations (i.e., answering
a correlated query set with an effective strategy) into the matrix
mechanism. However, applying the matrix mechanism in practice 
remains hard, because there is currently no effective solution to 
find a good strategy for an arbitrary query set. The only known 
strategy-searching methods described in [16] are either inefficient
(which incur prohibitively high computational costs for even mod- 
erately large domains), or ineffective (which rarely obtain strate- 
gies that outperform naive methods NOD/NOQ). Motivated by this, 
we propose the first practical realization of the matrix mechanism, 
called the low-rank mechanism (LRM), based on the theory of low- 
rank matrix approximation. We prove that the accuracy provided 
by LRM is within a constant factor of the theoretical lower bound 
established in [14]. Extensive experiments demonstrate that LRM
significantly outperforms existing solutions in terms of result accu- 
racy, sometimes by orders of magnitude. 

The rest of the paper is organized as follows. Section 2 reviews
previous studies on differential privacy. Section 3 provides formal
definitions for our problem. Section 4 presents the mechanism
formulation of LRM, and analyzes its optimality. Section 5 discusses
how to solve the optimization problem in LRM. Section 6 verifies
the superiority of our proposal through an extensive experimental
study. Finally, Section 7 concludes the paper.

2. RELATED WORK 

Section 2.1 surveys general purpose mechanisms for enforcing
differential privacy. Section 2.2 presents our main competitor, the
matrix mechanism [16].

2.1 Differential Privacy Mechanisms 

Differential privacy was first formally presented in [11], though
some previous studies have informally used similar models, e.g.,
[5]. The Laplace mechanism [11] is the first generic mechanism
for enforcing differential privacy, which works when the output
domain is a multi-dimensional Euclidean space. McSherry and
Talwar [21] propose the exponential mechanism, which applies to any
problem with a measurable output space. The generality of the ex- 
ponential mechanism makes it an important tool in the design of 
many other differentially private algorithms, e.g., [29].

Linear query processing is of particular interest in both the the- 
ory and database communities, due to its wide range of applica- 
tions. To minimize the error of linear queries under differential 
privacy requirements, several methods try to build a synopsis of the 
original database, such as Fourier transformations [24], wavelets
[28] and hierarchical trees [15]. By publishing a noisy synopsis
under $\epsilon$-differential privacy, these methods are capable of answering
an arbitrary number of linear queries. However, most of these
methods obtain good accuracy only when the query selection crite- 
rion is a continuous range; meanwhile, since these methods are not 
workload-aware, their performance for a specific workload tends to
be sub-optimal. 

The compressive mechanism [17] reduces the amount of noise
necessary to satisfy differential privacy, by utilizing the sparsity of 
the dataset under certain transformations. The main idea is to use 
a technique called compressive sensing to compress a sparse repre- 
sentation of the data into a compact synopsis, and inject noise into 
the much smaller synopsis instead of the original data. After that, 
the method reconstructs the original data by applying the decod- 
ing algorithm of compressive sensing to the noisy synopsis. The 
result provides significantly higher utility, while satisfying differ- 
ential privacy requirements. 



Several theoretical studies have derived lower bounds for the 
noise level for processing linear queries under differential privacy. 
Notably, Dinur and Nissim [9] prove that any perturbation mechanism
with maximal noise of scale $o(n)$ cannot possibly preserve
personal privacy, if the adversary is allowed to ask all possible
linear queries, and has exponential computation capacity. By reducing
the computation capacity of the adversary to polynomial-bounded
Turing machines, they show that an error scale of $\Omega(\sqrt{n})$ is necessary
to protect any individual's privacy.

More recently, Hardt and Talwar [14] have significantly tightened
the error lower bound for answering a batch of linear queries
under differential privacy. Given a batch of m linear queries, they
prove that any $\epsilon$-differential privacy mechanism leads to squared
error of at least $\Omega(\epsilon^{-2} m^3 \mathrm{Vol}(W))$, where $\mathrm{Vol}(W)$ is the volume
of the convex body obtained by transforming the $\ell_1$-unit ball into
m-dimensional space using the linear transformations in the workload
W. They also propose a mechanism for differential privacy
whose error level almost reaches this lower bound. However, their 
mechanism relies on uniform sampling in a high-dimensional con- 
vex body, which, although it theoretically takes polynomial time, 
is too expensive to be of practical use. This paper extends their 
analysis to low-rank workload matrices. 

Besides linear queries, differential privacy is also applicable to 
more complex queries in various research areas, due to its strong 
privacy guarantee. In the field of data mining, Friedman and
Schuster [13] propose the first algorithm for building a decision tree
under differential privacy. Mohammed et al. [22] study the same
problem, and propose an improved solution based on a generalization
strategy coupled with the exponential mechanism. Ding et
al. [8] investigate the problem of differentially private data cube
publication. They present a randomized materialized view selec- 
tion algorithm, which reduces the overall error, and preserves data 
consistency. 

In the database literature, a plethora of methods have been proposed
to optimize the accuracy of differentially private query processing.
Cormode et al. [6] investigate the problem of multi-dimensional
indexing under differential privacy, with the novel idea
of assigning different amounts of privacy budget to different levels
of the index. Xu et al. [29] optimize the procedure of building a
differentially private histogram, with an interesting combination of 
a dynamic programming algorithm for optimal histogram compu- 
tation and the exponential mechanism. 

Differential privacy is also becoming a hot topic in the machine 
learning community, especially for learning tasks involving sensitive
information, e.g., medical records. Chaudhuri et al.
propose a generic differentially private learning algorithm, which
requires strong convexity of the objective function. Rubinstein et
al. [25] study the problem of SVM learning on sensitive data, and
propose an algorithm to perturb the kernel matrix with performance
guarantees, when the loss function satisfies the $l$-Lipschitz continuity
property. General differential privacy techniques have also been
applied to real systems, such as network trace analysis [19] and
private recommender systems [20].

2.2 Matrix Mechanism 

Li et al. [16] propose the matrix mechanism, which formalizes
the intuition that a batch of correlated linear queries can be answered
more accurately under differential privacy, by processing
a different set of queries (called the strategy) and combining their
results. Specifically, given a workload of linear queries, the matrix
mechanism first constructs a workload matrix W of size $m \times n$,
where m is the number of queries, and n is the number of unit
counts. The construction of the workload matrix is elaborated further
in Section 3. After that, the mechanism searches for a strategy
matrix A of size $r \times n$, where r is a positive integer. Intuitively, A
corresponds to another set of linear queries, such that every query
in W can be expressed as a linear combination of the queries in A.
The matrix mechanism then answers the queries in A under differential
privacy, and subsequently uses their noisy results to answer
queries in W.

The main challenge for applying the matrix mechanism to practical
workloads is to identify an appropriate strategy matrix A. Ref.
[16] provides two algorithms for this purpose. The first, based on
iteratively solving a pair of related semidefinite programs, incurs
$O(m^3 n^3)$ computational overhead, which is prohibitively expensive
even for moderately large values of m and n. The second
solution computes an $\ell_2$ approximation of the optimal strategy
matrix A. This method, though faster than the first one, still incurs
high costs as we show in the experiments. Further, the $\ell_2$
approximation of the optimal strategy matrix often has poor quality.
In fact, throughout our experimental evaluations, we have never
found a single setting where this method obtains lower overall error
than the naive solution NOD that injects noise directly into the
unit counts. Although the matrix mechanism makes a significant
theoretical contribution, so far its practical use is limited due to the
lack of an effective implementation.

3. PRELIMINARIES 

In this paper, we assume there are n records in a database D, i.e.,
$D = \{x_1, x_2, \ldots, x_n\}$. Each $x_i$ in D is a real number. To facilitate
matrix manipulations, in the rest of the paper we use a vector of size
$n \times 1$ to denote the database, i.e. $(x_1, x_2, \ldots, x_n)^T$. In Figure 1,
for example, each record contains the number of HIV+ patients in
a state of the USA. A query set Q of cardinality m is a mapping
from the database domain to real numbers, i.e., $Q : \mathcal{D} \to \mathbb{R}^m$.

3.1 Differential Privacy 

A query processing mechanism M is a randomized mapping
from $\mathcal{D} \times \mathcal{Q}$ to $\mathbb{R}^m$. Given an arbitrary query set $Q \in \mathcal{Q}$ and a
database $D \in \mathcal{D}$, the mechanism M returns a distribution on the
query output domain $\mathbb{R}^m$. Two databases $D_1$ and $D_2$ are neighbor
databases iff they differ on exactly one record, i.e.,
$D_1 = \{x_1, x_2, \ldots, x_i, \ldots, x_n\}$ and $D_2 = \{x_1, x_2, \ldots, x'_i, \ldots, x_n\}$. A
randomized mechanism M satisfies $\epsilon$-differential privacy if for every
pair of neighbor databases $D_1$ and $D_2$, we have

$$\forall Q, \forall R: \Pr(M(Q, D_1) = R) \le e^{\epsilon} \Pr(M(Q, D_2) = R) \qquad (1)$$

The above inequality implies that the mechanism M always returns
similar results on neighbor databases. This limits the adversary's
confidence in inferring any record from the output of M,
even when he or she knows all remaining records in the database.

In [11], Dwork et al. presented a general protocol to implement
$\epsilon$-differential privacy, utilizing the concept of sensitivity. Given a
query set $Q \in \mathcal{Q}$, the sensitivity $\Delta$ is the maximal $\ell_1$ distance
between the exact query results on any neighbor databases $D_1$ and
$D_2$, i.e.

$$\Delta = \max_{D_1, D_2} \|Q(D_1) - Q(D_2)\|_1 \qquad (2)$$

We emphasize that $\Delta$ only depends on the data domain $\mathcal{D}$ and
the query set Q, not the actual data. Therefore, we simply assume
such a constant $\Delta$ is public knowledge to everyone, including the
adversary. The Laplace Mechanism [11], $M_L$, outputs a randomized
result R on database D, following a Laplace distribution with
mean Q(D) and magnitude $\Delta/\epsilon$, i.e.,

$$\Pr(M_L(Q, D) = R) \propto \exp\left(-\frac{\epsilon}{\Delta} \|R - Q(D)\|_1\right) \qquad (3)$$

This is equivalent to adding m-dimensional independent Laplace
noise, as $Q(D) + \mathrm{Lap}(\Delta/\epsilon)^m$, in which $\mathrm{Lap}(\Delta/\epsilon)$ is a random
variable following a zero-mean Laplace distribution with scale $\Delta/\epsilon$.
Based on the definition of the Laplace mechanism, the expected
squared error of the randomized query answer is $2m\Delta^2/\epsilon^2$, since the
variance of Lap(s) is $2s^2$ for any scale s. Note that the amount of
error only depends on the sensitivity of the queries, regardless of
the records in database D.
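A minimal sketch of the Laplace mechanism described above, using NumPy's random generator. The function name `laplace_mechanism` and its argument names are hypothetical, and the exact answers in the usage lines are made-up numbers used purely for illustration.

```python
import numpy as np

def laplace_mechanism(exact_answers, sensitivity, eps, rng):
    """Add independent zero-mean Laplace noise of scale sensitivity/eps
    to every entry of the exact query result Q(D)."""
    exact_answers = np.asarray(exact_answers, dtype=float)
    scale = sensitivity / eps
    return exact_answers + rng.laplace(loc=0.0, scale=scale, size=exact_answers.shape)

rng = np.random.default_rng(0)
# Hypothetical exact answers to m = 3 queries with sensitivity 2 and budget eps = 0.5.
noisy = laplace_mechanism([120.0, 45.0, 75.0], sensitivity=2.0, eps=0.5, rng=rng)

# Empirical check that the per-query noise variance is close to 2 * (sensitivity/eps)^2.
samples = laplace_mechanism(np.zeros(200_000), sensitivity=2.0, eps=0.5, rng=rng)
print(noisy, samples.var(), 2 * (2.0 / 0.5) ** 2)
```

The scale depends only on the sensitivity and the privacy budget, so the same noise distribution is used whatever the records in D are.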

3.2 Batch Linear Queries 

As mentioned in the introduction, we focus on non-interactive 
linear queries in this paper. A linear query q(D) is in the form of 
a linear function over the records in the database. Given a weight 
vector $(w_1, w_2, \ldots, w_n)^T$ of size n, the linear query returns the
dot product between the weight vector and database vector, i.e.,

$$q(D) = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$

We assume a batch of m linear queries, $Q = \{q_1, q_2, \ldots, q_m\}$,
is submitted to the database at the same time. The query set Q
is thus represented by a workload matrix W with m rows and n
columns. Each entry $W_{ij}$ in W is the j-th coefficient for query
$q_i$ on record $x_j$. Using the vector representation of the database,
i.e. $D = (x_1, x_2, \ldots, x_n)^T$, the query batch Q can be exactly
answered by calculating:

$$Q(D) = WD = \left(\sum_j W_{1j} x_j, \ldots, \sum_j W_{mj} x_j\right)^T$$




Based on the Laplace mechanism, two baseline solutions to enforce
$\epsilon$-differential privacy on a query batch with workload W are
as follows.

Noise on data: This solution, denoted as $M_D$, adds noise to the
original data. Given database D, $M_D$ generates a noisy database
D' using the Laplace mechanism, i.e., $D' = D + \mathrm{Lap}(\Delta/\epsilon)^n$. The
query batch Q is then answered by replacing D with D'. The whole
mechanism can be written in the form of manipulation on random
variables, as follows.

$$M_D(Q, D) = WD' = W\left(D + \mathrm{Lap}(\Delta/\epsilon)^n\right) \qquad (4)$$

Based on the linearity of expectation, it is straightforward to calculate
the expected squared error on the output, $\frac{2\Delta^2}{\epsilon^2} \sum_{i,j} W_{ij}^2$,
which is proportional to the squared sum of the entries in W.

Noise on results: This baseline solution, denoted as $M_R$, adds
noise to the query results instead of the original data. Since the
queries are linear queries, the sensitivity of the query set is
$\Delta' = \max_j \sum_i |W_{ij}| \Delta$, i.e., the highest column absolute sum [16]. Thus,
$M_R$ outputs the following random results.

$$M_R(Q, D) = WD + \mathrm{Lap}(\Delta'/\epsilon)^m \qquad (5)$$

Similarly, the expected squared error of the mechanism on query
Q is $2m\Delta'^2\epsilon^{-2} = 2m \left(\max_j \sum_i |W_{ij}|\right)^2 \Delta^2 \epsilon^{-2}$. By comparing their
expected squared errors, we derive that $M_R$ outperforms $M_D$ in
expectation iff $m \left(\max_j \sum_i |W_{ij}|\right)^2 < \sum_i \sum_j W_{ij}^2$. When $m > n$,
this inequality can never hold, implying that $M_R$ is more effective
only when m is smaller than n.
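The two baseline error formulas are easy to compare on concrete workloads. The sketch below is illustrative: the function names are hypothetical, and the sensitivity of the query set under noise on results is computed from its definition as the largest column absolute sum of W (times $\Delta$).

```python
import numpy as np

def expected_error_noise_on_data(W, delta=1.0, eps=1.0):
    """Expected squared error of M_D: (2 * delta^2 / eps^2) * sum of squared entries of W."""
    return 2.0 * delta**2 / eps**2 * float((W**2).sum())

def expected_error_noise_on_results(W, delta=1.0, eps=1.0):
    """Expected squared error of M_R: 2 * m * delta'^2 / eps^2,
    with delta' = (largest column absolute sum of W) * delta."""
    m = W.shape[0]
    delta_prime = float(np.abs(W).sum(axis=0).max()) * delta
    return 2.0 * m * delta_prime**2 / eps**2

# Workload from the introduction (columns NY, NJ, CA, WA; rows q1, q2, q3).
W_batch = np.array([[1., 1, 1, 1],
                    [1, 1, 0, 0],
                    [0, 0, 1, 1]])
# A single total-count query, where m = 1 is much smaller than n = 4.
W_single = np.array([[1., 1, 1, 1]])

for W in (W_batch, W_single):
    print(expected_error_noise_on_data(W), expected_error_noise_on_results(W))
```

On the three-query batch noise on data wins (16 versus 24), while on the single total-count query noise on results wins (2 versus 8), in line with the observation that $M_R$ can only help when m is small relative to n.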

3.3 Low Rank Matrices 

For any square matrix $A = \{A_{ij}\}$ of size $n \times n$, the trace of
the matrix is the sum of the diagonal entries in A, i.e.,
$\mathrm{tr}(A) = \sum_i A_{ii}$. Given a matrix $W = \{W_{ij}\}$ of size $m \times n$, the Frobenius
norm of W is the square root of the squared sum over all entries,
i.e., $\|W\|_F = \sqrt{\sum_{i,j} W_{ij}^2}$. Following common notation, $W^T$
denotes the transposed matrix of W.

Singular value decomposition (SVD) applies to any real-valued
matrix W. Specifically, the result of SVD on W includes three
matrices, U, $\Sigma$ and V, such that $W = U \Sigma V$. Here, U, $\Sigma$, and
V are of size $m \times s$, $s \times s$, and $s \times n$ respectively, where m and
n are the number of rows and columns in W respectively, and s
is a positive integer no larger than $\min\{m, n\}$. Moreover, U and
V are row-wise and column-wise orthogonal matrices respectively.
$\Sigma$ is a diagonal matrix, which contains non-negative real numbers
on the diagonal and zeros in all the other entries. These diagonal
entries, $\{\lambda_1, \lambda_2, \ldots, \lambda_s\}$, are the singular values of W, which we
refer to as the eigenvalues of the matrix W throughout the paper.
The number of non-zero eigenvalues is called the rank of W,
denoted as rank(W).
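For a concrete feel of these quantities, the following sketch runs NumPy's SVD on the three-query workload from the introduction. The tolerance used to decide which diagonal entries count as non-zero is an arbitrary choice made for this example.

```python
import numpy as np

# Workload from the introduction: q1 = q2 + q3, so the rows are linearly dependent.
W = np.array([[1., 1, 1, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1]])

# U is m x s, sigma holds the s diagonal entries of Sigma, Vt is s x n, with s = min(m, n).
U, sigma, Vt = np.linalg.svd(W, full_matrices=False)

rank = int((sigma > 1e-10).sum())        # number of non-zero diagonal entries
reconstruction = (U * sigma) @ Vt        # U @ diag(sigma) @ Vt
frobenius = np.sqrt((W**2).sum())        # equals sqrt(sum of squared diagonal entries)

print(rank, sigma, frobenius)
```

Here the rank is 2 although W has three rows and four columns, which is exactly the kind of row and column correlation the next paragraph refers to.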

When the rows and columns in the matrix W are correlated, the 
rank of the matrix W can be smaller than m and n. In such cases, 
we say that W is a low rank matrix. For example, when a group 
of records tend to appear together in a query, the workload matrix 
W often exhibits strong column correlations. Similarly, when one 
query can be expressed as the linear combination of other queries, 
W has strong row correlations. Both cases can be exploited to re- 
duce the noise level necessary to satisfy differential privacy, as we 
showed in Section 1. Next we present the Low Rank Mechanism, 
a general solution to enforce differential privacy on a batch of lin- 
ear queries, which utilizes the low rank property of the workload 
matrix to reduce noise. 

4. WORKLOAD DECOMPOSITION 

In this section, we propose a general workload matrix decom- 
position technique that minimizes the error for a batch of linear 
queries. Recall that the example in Figure 1 shows that instead
of adding noise to the original data or query results (i.e., methods
NOD and NOR), it is sometimes possible to construct another linear
basis that leads to higher overall query accuracy. To build such
a basis, we partition the workload matrix W into the product of
two components, $B = \{B_{ij}\}$ of size $m \times r$ and $L = \{L_{jk}\}$ of
size $r \times n$, such that W = BL. Note that r can be larger than the
rank of the workload matrix W. Given the matrix decomposition,
we design a general mechanism for adding noise to LD (D is the
dataset), and analyze the expected squared error. We first formally
define the concepts of query scale and query sensitivity, for a given 
decomposition W = BL. 

DEFINITION 1. Query Scale

Given a workload decomposition W = BL, the scale of the
decomposition, denoted by $\Phi(B, L)$, is the squared sum of the entries
in B, i.e., $\Phi(B, L) = \sum_{i,j} B_{ij}^2$.

DEFINITION 2. Query Sensitivity

Given a workload decomposition W = BL, the sensitivity of the
decomposition, denoted by $\Delta(B, L)$, is the maximal absolute sum
of any column in L, i.e., $\Delta(B, L) = \max_j \sum_i |L_{ij}|$.

Since W = BL, the linear query batch can be answered by 
calculating Q(D) = WD = BLD. Unlike solutions NOD and 
NOR, we inject noise into the intermediate result LD to enforce 
differential privacy. Since LD is another group of linear queries, 
we can apply NOR on Q'(D) = LD with Eq. (3). The sensitivity
of the new linear query batch is $\Delta(B, L)$, which leads to the
following differential privacy mechanism $M_P(Q, D)$ with respect to
the workload decomposition W = BL.

$$M_P(Q, D) = B\left(LD + \mathrm{Lap}\left(\frac{\Delta(B, L)}{\epsilon}\right)^r\right) \qquad (6)$$



The error analysis of $M_P(Q, D)$ is complicated as it adds noise
at an intermediate step. The following lemma shows that the error
is linear in the query scale, and quadratic in the query sensitivity.

LEMMA 1. The expected squared error of $M_P(Q, D)$ with respect
to the decomposition W = BL is $2\Phi(B, L) (\Delta(B, L))^2/\epsilon^2$.

Accordingly, we reduce the problem to finding the optimal workload
decomposition W = BL that minimizes $\Phi(B, L) (\Delta(B, L))^2$.
However, this optimization problem is difficult to solve, since the
objective function is the product of $\Phi(B, L)$ and $(\Delta(B, L))^2$, and
$\Delta(B, L)$ may not be differentiable. To address this problem, we first
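The lemma can be checked empirically with a short simulation. The sketch below is an illustration under stated assumptions: the function names (`scale`, `sensitivity`, `low_rank_answer`, `expected_squared_error`) are hypothetical, the decomposition is the two-query strategy from the introduction, and the database vector is made up.

```python
import numpy as np

def scale(B):
    """Query scale: sum of squared entries of B."""
    return float((B**2).sum())

def sensitivity(L):
    """Query sensitivity: largest column absolute sum of L."""
    return float(np.abs(L).sum(axis=0).max())

def low_rank_answer(B, L, D, eps, rng):
    """Answer W = B L by adding Laplace noise to the intermediate result L D."""
    noise = rng.laplace(loc=0.0, scale=sensitivity(L) / eps, size=L.shape[0])
    return B @ (L @ D + noise)

def expected_squared_error(B, L, eps):
    """Closed form stated in the lemma: 2 * scale * sensitivity^2 / eps^2."""
    return 2.0 * scale(B) * sensitivity(L) ** 2 / eps**2

B = np.array([[1., 1], [1, 0], [0, 1]])       # q1 = q2 + q3
L = np.array([[1., 1, 0, 0], [0, 0, 1, 1]])   # strategy rows: q2, q3
D = np.array([3., 5, 2, 7])                   # hypothetical unit counts

rng = np.random.default_rng(0)
exact = B @ L @ D
trials = [((low_rank_answer(B, L, D, 1.0, rng) - exact) ** 2).sum() for _ in range(20000)]
print(np.mean(trials), expected_squared_error(B, L, 1.0))
```

With these inputs the closed form evaluates to 8, and the simulated average should land close to it; the data vector D does not influence the error, only B and L do.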
prove an interesting property of the workload decomposition, which 
implies that the exact query sensitivity is actually not important. 

LEMMA 2. Given a workload decomposition W = BL and a
positive constant $\alpha$, we can always construct another decomposition
W = B'L' such that $B' = \alpha B$ and $L' = \alpha^{-1} L$, satisfying

$$\Phi(B, L) (\Delta(B, L))^2 = \Phi(B', L') (\Delta(B', L'))^2$$
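The lemma follows directly from the definitions of query scale and query sensitivity. A short derivation, writing $\alpha$ for the lemma's positive constant, with $B' = \alpha B$ and $L' = \alpha^{-1} L$:

```latex
\begin{align*}
B'L' &= (\alpha B)(\alpha^{-1} L) = BL = W, \\
\Phi(B', L') &= \sum_{i,j} (\alpha B_{ij})^2 = \alpha^2\, \Phi(B, L), \\
\Delta(B', L') &= \max_j \sum_i |\alpha^{-1} L_{ij}| = \alpha^{-1}\, \Delta(B, L), \\
\Phi(B', L')\,\bigl(\Delta(B', L')\bigr)^2
  &= \alpha^2\, \Phi(B, L) \cdot \alpha^{-2}\, \bigl(\Delta(B, L)\bigr)^2
   = \Phi(B, L)\,\bigl(\Delta(B, L)\bigr)^2 .
\end{align*}
```

In particular, choosing $\alpha = \Delta(B, L)$ yields an equivalent decomposition whose sensitivity is exactly 1.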

According to the above lemma, the balance between scale and 
sensitivity is not important, as we can always build another equiv- 
alent workload decomposition with arbitrary sensitivity. This mo- 
tivates us to formulate a new optimization program, which focuses 
on minimizing the query scale while fixing the query sensitivity. 
The following theorem formalizes this claim. 

THEOREM 1. Given the workload W, W = BL is the optimal
workload decomposition to minimize expected squared error if
(B, L) is the optimal solution to the following program:

$$\begin{aligned} \text{Minimize: } & \mathrm{tr}(B^T B) \\ \text{s.t. } & W = BL \\ & \forall j: \sum_i |L_{ij}| \le 1 \end{aligned} \qquad (7)$$



In the optimization problem above, we are allowed to specify
the number of columns in the matrix B, i.e. the rank r of the
matrix product BL. This enables us to generate matrices of
significantly lower rank than the strategy matrix proposed in [16]. We
thus use Low Rank Mechanism to denote the general query processing
scheme in Eq. (6), using the optimal decomposition solution to
Formula (7).

4.1 Optimality Analysis 

In this subsection, we analyze the optimality of our optimization 
formulation. Specifically, we show that the utility of our proposed 
mechanism almost reaches the known utility lower bound for linear 
queries under differential privacy [14].

LEMMA 3. Given a workload matrix W of rank r with eigenvalues
$\{\lambda_1, \ldots, \lambda_r\}$, the expected squared error of $M_P(Q, D)$
w.r.t. the optimal decomposition $W = B^* L^*$ in low rank mechanism
is bounded above by $2r \sum_{k=1}^{r} \lambda_k^2 / \epsilon^2$.
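One way to see that an upper bound of this kind is attainable is to exhibit a feasible, though not necessarily optimal, decomposition built from the SVD. The construction below is an assumption made for illustration and is not claimed to be the proof used by the authors: it sets $B = \sqrt{r}\, U \Sigma$ and $L = V / \sqrt{r}$, so that every column of L has absolute sum at most 1, and the resulting error is $2 r \sum_k \lambda_k^2 / \epsilon^2$. The function names are hypothetical.

```python
import numpy as np

def svd_decomposition(W, tol=1e-10):
    """Feasible decomposition W = B L with every column of L having L1 norm <= 1."""
    U, sigma, Vt = np.linalg.svd(W, full_matrices=False)
    r = int((sigma > tol).sum())                 # numerical rank of W
    U, sigma, Vt = U[:, :r], sigma[:r], Vt[:r]
    B = np.sqrt(r) * U * sigma                   # m x r, equals sqrt(r) * U @ diag(sigma)
    L = Vt / np.sqrt(r)                          # r x n; columns of Vt have L2 norm <= 1
    return B, L, sigma

def error_upper_bound(W, eps=1.0):
    """Error of the decomposition above: 2 * r * sum(sigma^2) / eps^2."""
    _, _, sigma = svd_decomposition(W)
    return 2.0 * len(sigma) * float((sigma**2).sum()) / eps**2

# Second workload from the introduction (columns NY, NJ, CA, WA).
W = np.array([[0., 2, 1, 1],
              [0, 1, 0, 2],
              [1, 0, 2, 2]])
B, L, sigma = svd_decomposition(W)
print(np.abs(L).sum(axis=0).max(), np.trace(B.T @ B), error_upper_bound(W))
```

Because this decomposition is only one feasible point, the optimum of the program can be considerably better; for this small workload the value 120 is well above the errors discussed in Section 1.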

Using the geometric analysis technique under orthogonal projection
[14], the following lemma reveals a lower bound on the
squared error for linear queries.

LEMMA 4. Given a workload matrix W of rank r with eigenvalues
$\{\lambda_1, \ldots, \lambda_r\}$, the expected squared error of any $\epsilon$-differential
privacy mechanism is at least




Assume that all the eigenvalues $\{\lambda_1, \lambda_2, \ldots, \lambda_r\}$ of workload
W are ordered in non-ascending order. We use $C = \lambda_1/\lambda_r$ to
denote the ratio between the largest eigenvalue and the smallest
non-zero eigenvalue. The following theorem discusses the tightness
of the low rank mechanism on error minimization. In particular, it
proves the optimality of the result decomposition $W = B^* L^*$ with
respect to Formula (7).

THEOREM 2. When r > 5, the mechanism $M_P(Q, D)$ using
$W = B^* L^*$ is an $O(C^2 r)$-approximately optimal solution w.r.t.
the set of all non-interactive $\epsilon$-differential privacy mechanisms.

When C is close to 1, all non-zero eigenvalues are close to each
other and the mechanism under our decomposition optimization
program outputs results that well approximate the lower bound.
This result answers one of the questions in [14], in which the authors
discussed possible orthogonal projections but did not provide
a concrete algorithm to identify the optimal projection. Our formu- 
lation can be regarded as an implementation of orthogonal projec- 
tion with almost constant approximation. Therefore, our result fills 
the gap between theory and practice. 

4.2 Relaxation on Decomposition 

Theorem 2 shows that our decomposition leads to results with
a tight bound. However, when there are very small eigenvalues in
the workload matrix W, the bound in the theorem becomes loose.
On the other hand, these small eigenvalues contribute little to the
workload matrix W. This observation motivates us to design a new
optimization formulation, in which BL need not match W exactly,
but only within a small error tolerance. This enables the formulation
to find a more compact decomposition, such that the r used in B
and L can be smaller than the actual rank of W.

To do this, we introduce a new parameter $\gamma$ to bound the difference
between W and BL in terms of the Frobenius norm. This
leads to a new optimization problem:



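$$\begin{aligned} \text{Minimize: } & \mathrm{tr}(B^T B) \\ \text{s.t. } & \|W - BL\|_F \le \gamma \\ & \forall j: \sum_i |L_{ij}| \le 1 \end{aligned} \qquad (8)$$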



After finding the optimal (B, L) for the problem in Formula (8),
the mechanism $M_P(Q, D)$ outputs query results using Eq. (6).
The error of this new mechanism is also bounded, as stated in the
following theorem.

THEOREM 3. The expected squared error of $M_P(Q, D)$ using
the decomposition (B, L) satisfying Formula (8) is at most

$$2\,\mathrm{tr}(B^T B)/\epsilon^2 + \gamma \sum_i x_i^2$$

While Theorem 3 implies the possibility of estimating the optimal
$\gamma$, it is not practical to implement it directly, because this
estimation depends on the data, i.e., $\sum_i x_i^2$. In our experiments,
we test different values of $\gamma$, and report their relative performance,
regardless of the data distribution.
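The two terms of the bound correspond to the noise injected by the mechanism and to the approximation error introduced by allowing BL to differ from W. A sketch of the decomposition, assuming (B, L) is feasible for the relaxed program (so $\|W - BL\|_F \le \gamma$ and $\Delta(B, L) \le 1$) and writing $\eta$ for the Laplace noise vector:

```latex
\begin{align*}
M_P(Q, D) - WD &= (BL - W)D + B\eta,
  \qquad \eta \sim \mathrm{Lap}\bigl(\Delta(B, L)/\epsilon\bigr)^r, \\
\mathbb{E}\,\|M_P(Q, D) - WD\|_2^2
  &= \|(W - BL)D\|_2^2 + \mathbb{E}\,\|B\eta\|_2^2
  && \text{(the cross term vanishes since } \mathbb{E}[\eta] = 0\text{)}, \\
\mathbb{E}\,\|B\eta\|_2^2
  &= \frac{2\,\Delta(B, L)^2}{\epsilon^2} \sum_{i,j} B_{ij}^2
   \le \frac{2\,\mathrm{tr}(B^T B)}{\epsilon^2}
  && \text{(since } \Delta(B, L) \le 1\text{)}, \\
\|(W - BL)D\|_2^2
  &\le \|W - BL\|_F^2 \sum_i x_i^2 \le \gamma^2 \sum_i x_i^2 .
\end{align*}
```

Under these assumptions the data-dependent term is at most $\gamma^2 \sum_i x_i^2$, which is no larger than $\gamma \sum_i x_i^2$ whenever $\gamma \le 1$. Either way the second term grows with the magnitude of the data, which is why $\gamma$ cannot be tuned without looking at D.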



Algorithm 1 Workload Matrix Decomposition

Initialize $\pi^{(0)}$, $\beta^{(0)} = 1$, $k = 1$
while not converged do
    // Approximately solve the subproblem
    while not converged do
        $B^{(k)}$ ← update B using Eq. (9)
        $L^{(k)}$ ← run Algo. 2 to update L w.r.t. Formula (10)
    Compute $\tau = \|W - B^{(k)} L^{(k)}\|_F$
    if $\tau$ is sufficiently small or $\beta$ is sufficiently large then
        return $B^{(k)}$ and $L^{(k)}$
    if k is divisible by 10 then
        $\beta^{(k+1)} = 2\beta^{(k)}$
    $\pi^{(k+1)} = \pi^{(k)} + \beta^{(k+1)} (W - B^{(k)} L^{(k)})$
    $k = k + 1$



5. DECOMPOSITION ALGORITHM 

The previous section formulates the workload matrix decomposition
problem as an optimization program, which is rather complicated
and non-trivial to solve. This section describes an effective
and efficient solution for this program, based on the inexact
Augmented Lagrangian Multiplier (ALM) method [5, 18].

The main challenge in solving the optimization program of
Formula (8) is the non-smooth $\ell_1$ regularized term. The projected
gradient method [10] is considered one of the most efficient general
algorithms to solve these problems. Following the strategy
used in [10], we treat the $\ell_1$ regularized term separately and
approximately minimize a sequence of Lagrangian subproblems. Our
inexact Augmented Lagrangian method for the workload matrix
decomposition problem is summarized in Algorithm 1.

In order to handle the constraint $\|W - BL\|_F \le \gamma \to 0$,
in which $W \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{m \times r}$ and $L \in \mathbb{R}^{r \times n}$, the inexact
Augmented Lagrangian method introduces a positive penalty term
$\beta \in \mathbb{R}$ and the Lagrange multiplier $\pi \in \mathbb{R}^{m \times n}$. The update on $\beta$
and $\pi$ follows the standard strategy used in [5, 18]. Given fixed $\beta$
and $\pi$ in each iteration, the algorithm aims to find a pair of new B
and L to minimize the following subproblem:

$$J(B, L, \beta, \pi) = \frac{1}{2}\mathrm{tr}(B^T B) + \langle \pi, W - BL \rangle + \frac{\beta}{2}\|W - BL\|_F^2 \quad \text{s.t. } \forall j\ \sum_i |L_{ij}| \le 1$$



This is a bi-convex optimization problem, which can be solved
by block gradient descent, alternately optimizing B and L. Based
on the formulation above, optimizing B is straightforward. The
gradient with respect to B is:

$$\frac{\partial J}{\partial B} = B - \pi L^T + \beta B L L^T - \beta W L^T$$

Since $J(\cdot)$ is convex with respect to B, we can set
$\frac{\partial J}{\partial B} = 0$ and obtain a closed-form solution to update B:

$$B = (\beta W L^T + \pi L^T)(\beta L L^T + I)^{-1} \qquad (9)$$
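The closed-form update in Eq. (9) can be checked numerically. The following is a minimal sketch in standard-library Python rather than the authors' Matlab; the helper names (`matmul`, `solve`, `update_B`, `grad_B`) and the small example matrices are illustrative assumptions, not part of the paper. It solves the linear system $B(\beta LL^T + I) = \beta WL^T + \pi L^T$ instead of forming an explicit inverse, and verifies that the gradient of J with respect to B vanishes at the result.

```python
def matmul(X, Y):
    """Product of two dense matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def solve(M, Y):
    """Solve M X = Y by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    A = [list(M[i]) + list(Y[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for i in range(n):
            if i != c and A[i][c] != 0.0:
                f = A[i][c]
                A[i] = [vi - f * vc for vi, vc in zip(A[i], A[c])]
    return [row[n:] for row in A]


def update_B(W, L, pi, beta):
    """Eq. (9): B = (beta W L^T + pi L^T)(beta L L^T + I)^{-1}."""
    Lt = transpose(L)
    r = len(L)
    rhs = [[beta * a + b for a, b in zip(ra, rb)]
           for ra, rb in zip(matmul(W, Lt), matmul(pi, Lt))]   # m x r
    M = matmul(L, Lt)
    M = [[beta * M[i][j] + (1.0 if i == j else 0.0) for j in range(r)]
         for i in range(r)]                                     # r x r, symmetric
    # B M = rhs is equivalent to M B^T = rhs^T because M is symmetric.
    return transpose(solve(M, transpose(rhs)))


def grad_B(B, W, L, pi, beta):
    """dJ/dB = B - pi L^T + beta B L L^T - beta W L^T."""
    Lt = transpose(L)
    piLt, BLLt, WLt = matmul(pi, Lt), matmul(matmul(B, L), Lt), matmul(W, Lt)
    return [[B[i][j] - piLt[i][j] + beta * BLLt[i][j] - beta * WLt[i][j]
             for j in range(len(B[0]))] for i in range(len(B))]


# Hypothetical toy instance: m = 2 queries, n = 3 domain cells, r = 2.
W = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
L = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
pi = [[0.1, 0.0, -0.2], [0.0, 0.3, 0.0]]
beta = 2.0
B = update_B(W, L, pi, beta)
max_grad = max(abs(g) for row in grad_B(B, W, L, pi, beta) for g in row)
```

Solving the $r \times r$ system keeps the cost of the update dominated by the matrix products, which is in the spirit of the per-update cost discussed in the complexity analysis.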



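The second step is to optimize L, which is equivalent to solving
the following quadratic programming problem:

$$\min_{L}\ \mathcal{G}(L) = \frac{\beta}{2}\mathrm{tr}(L^T B^T B L) - \mathrm{tr}\big((\beta W + \pi)^T B L\big) \quad \text{s.t. } \forall j\ \sum_i |L_{ij}| \le 1 \qquad (10)$$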



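Algorithm 2 Nesterov's Projection Gradient Method

 1: input: $\mathcal{G}(L)$, $\partial\mathcal{G}/\partial L$, $L^{(0)}$
 2: tolerance $\chi$ proportional to $r \cdot n$, Lipschitz parameter: $\omega^{(0)} = 1$
 3: Initializations: $L^{(1)} = L^{(0)}$, $\delta^{(-1)} = 0$, $\delta^{(0)} = 1$, $t = 1$
 4: while not converged do
 5:     Compute the search point $S$ by extrapolating from the two most recent iterates, with a momentum weight determined by the sequence $\delta$
 6:     for $j = 0$ to ... do
 7:         $\omega = 2^j \omega^{(t-1)}$, $U = S - \frac{1}{\omega}\nabla\mathcal{G}(S)$
 8:         Project $U$ to the feasible set to obtain $L^{(t)}$ (i.e. solve Formula (11))
 9:         if $\|S - L^{(t)}\|_F < \chi$ then
10:             return
11:         Define function: $J_{\omega,S}(U) = \mathcal{G}(S) + \langle \nabla\mathcal{G}(S), U - S \rangle + \frac{\omega}{2}\|U - S\|_F^2$
12:         if $\mathcal{G}(L^{(t)}) \le J_{\omega,S}(U)$ then
13:             $\omega^{(t)} = \omega$; $L^{(t+1)} = L^{(t)}$; break
14:     Set $\delta^{(t)} = \frac{1 + \sqrt{1 + 4(\delta^{(t-1)})^2}}{2}$
15:     $t = t + 1$
16: return $L^{(t)}$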



In order to minimize Formula (10) under its constraints, we employ
Nesterov's first order optimal method [23] to accelerate the gradient
descent. Nesterov's method has a much faster convergence rate than
traditional methods such as the subgradient method or the naive
projected gradient descent. In particular, the gradient of $\mathcal{G}(L)$ with
respect to L is

$$\frac{\partial \mathcal{G}}{\partial L} = \beta B^T B L - \beta B^T W - B^T \pi$$
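The gradient expression can be sanity-checked against central finite differences of the objective in Formula (10). This is a minimal sketch in standard-library Python; the function names and the toy matrices are illustrative assumptions. Because $\mathcal{G}$ is quadratic in L, the central difference agrees with the analytic gradient up to rounding error.

```python
def matmul(X, Y):
    """Product of two dense matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def objective_G(L, B, W, pi, beta):
    """G(L) = (beta/2) tr(L^T B^T B L) - tr((beta W + pi)^T B L)."""
    BL = matmul(B, L)
    quad = sum(v * v for row in BL for v in row)   # tr(L^T B^T B L) = ||BL||_F^2
    lin = sum((beta * W[i][j] + pi[i][j]) * BL[i][j]
              for i in range(len(W)) for j in range(len(W[0])))
    return 0.5 * beta * quad - lin


def gradient_G(L, B, W, pi, beta):
    """dG/dL = beta B^T B L - beta B^T W - B^T pi = B^T (beta (BL - W) - pi)."""
    BL = matmul(B, L)
    R = [[beta * (BL[i][j] - W[i][j]) - pi[i][j] for j in range(len(W[0]))]
         for i in range(len(W))]
    Bt = [list(col) for col in zip(*B)]
    return matmul(Bt, R)


def numeric_gradient(L, B, W, pi, beta, h=1e-5):
    """Central finite differences of G, one entry of L at a time."""
    grad = [[0.0] * len(L[0]) for _ in L]
    for i in range(len(L)):
        for j in range(len(L[0])):
            Lp = [row[:] for row in L]
            Lm = [row[:] for row in L]
            Lp[i][j] += h
            Lm[i][j] -= h
            grad[i][j] = (objective_G(Lp, B, W, pi, beta)
                          - objective_G(Lm, B, W, pi, beta)) / (2 * h)
    return grad


# Hypothetical toy instance: m = 2, n = 3, r = 2.
B = [[1.0, 0.5], [-0.3, 2.0]]
W = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
pi = [[0.1, 0.0, -0.2], [0.0, 0.3, 0.0]]
L = [[0.4, -0.1, 0.2], [0.0, 0.6, -0.5]]
beta = 2.0
analytic = gradient_G(L, B, W, pi, beta)
numeric = numeric_gradient(L, B, W, pi, beta)
max_diff = max(abs(a - b) for ra, rb in zip(analytic, numeric)
               for a, b in zip(ra, rb))
```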



L is updated by gradient descent while ensuring that the $\ell_1$
regularized constraint on L is satisfied. This can be done by solving
the following optimization problem:

$$\min_{L}\ \|L - L^{(k)}\|_F \quad \text{s.t. } \forall j\ \sum_i |L_{ij}| \le 1, \qquad (11)$$



in which $L^{(k)}$ denotes the last feasible solution after exactly k
iterations. Since Formula (11) can be decoupled into n independent $\ell_1$
regularized sub-problems, one per column of L, it can be solved efficiently
by $\ell_1$ projection methods [10]. The complete algorithm for the projection
method is summarized in Algorithm 2.
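The column-wise projection behind Formula (11) can be sketched with a standard sort-based Euclidean projection onto the $\ell_1$ ball. Whether this is the exact routine used in [10] or in the authors' implementation is an assumption; the function names `project_l1_ball` and `project_columns` are hypothetical. Minimizing the Frobenius distance is equivalent to minimizing its square, so each column of L is projected independently.

```python
import math


def project_l1_ball(v, radius=1.0):
    """Euclidean projection of vector v onto {x : sum_i |x_i| <= radius}."""
    abs_v = [abs(x) for x in v]
    if sum(abs_v) <= radius:
        return list(v)                      # already feasible
    u = sorted(abs_v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - radius) / j
        if uj - t > 0:                      # holds on a prefix; keep the last such j
            theta = t
    # Soft-threshold the magnitudes by theta and restore the signs.
    return [math.copysign(max(a - theta, 0.0), x) for a, x in zip(abs_v, v)]


def project_columns(L, radius=1.0):
    """Project every column of the r x n matrix L onto the l1 ball."""
    cols = [project_l1_ball(list(col), radius) for col in zip(*L)]
    return [list(row) for row in zip(*cols)]


# Hypothetical example: first column violates the constraint, second does not.
L = [[0.5, 0.2], [-1.0, 0.3], [0.25, 0.1]]
P = project_columns(L)
```

In this example the first column (absolute sum 1.75) is shrunk onto the boundary of the ball, while the second column (absolute sum 0.6) is left unchanged.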

Convergence Analysis: In each iteration, the algorithm solves a
sequence of Lagrangian subproblems by optimizing B (step 5) and
L (step 6) alternately. The algorithm stops when the residual
$\|W - BL\|_F$ is sufficiently small or the penalty parameter $\beta$ is
sufficiently large. It suffices to guarantee that L converges to the
optimal solution [18]. Although the objective function is non-smooth,
the algorithm possesses excellent convergence properties. To be precise,
we formally establish the following convergence statement.

THEOREM 4. If $(B^{(k)}, L^{(k)})$ is the temporary solution after the
k-th iteration and $(B^*, L^*)$ is the optimal solution to Formula (8),
we have

$$\left| \mathrm{tr}(B^{(k)T} B^{(k)}) - \mathrm{tr}(B^{*T} B^{*}) \right| \le O\!\left(\frac{1}{\beta^{(k-1)}}\right)$$



Since $\beta^{(k)}$ doubles after every 10 iterations, the bound above
shrinks geometrically, so the algorithm converges rapidly.
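To make the effect of the doubling rule concrete, the following minimal Python sketch tabulates the penalty parameter used at each outer iteration of Algorithm 1, assuming $\beta$ starts at 1 and is doubled whenever the iteration counter k is divisible by 10. The function name `penalty_schedule` is hypothetical. The quantity $1/\beta^{(k)}$ that appears in the bound of Theorem 4 then halves every 10 iterations.

```python
def penalty_schedule(num_iters, beta0=1.0):
    """Return [beta^(1), ..., beta^(num_iters)] under the doubling rule."""
    betas = []
    beta = beta0
    for k in range(1, num_iters + 1):
        betas.append(beta)       # penalty in effect during iteration k
        if k % 10 == 0:          # line 10 of Algorithm 1
            beta *= 2.0
    return betas


betas = penalty_schedule(50)
bound_factors = [1.0 / b for b in betas]   # the 1/beta factor in Theorem 4
```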

Complexity Analysis: The total number of variables in B and L
is $(m + n)r$. Each update on B in Eq. (9) takes $O(r^2 m)$ time, while
each update on L takes $O(r^2 n)$ time. If Algorithm 1 converges to
a local minimum with $N_{in}$ inner iterations (at line 4 in Algorithm 2)
and $N_{out}$ outer iterations (at line 2 in Algorithm 1), the total
complexity of Algorithm 1 is $O(N_{in} \times N_{out} \times (r^2 m + r^2 n))$.

Figure 2: Effect of varying relaxation parameter $\gamma$ with the Search Logs dataset for LRM
Figure 3: Effect of varying r with Search Logs dataset for LRM

6. EXPERIMENTS 

This section demonstrates the effectiveness of the proposed Low-Rank
Mechanism (LRM), and compares it against four state-of-the-art
methods: the approximate Matrix Mechanism (AMM) that optimizes
the $\ell_2$ approximation [16], the Laplace Mechanism (LM) [11],
the Wavelet Mechanism (WM) [28] and the Hierarchical Mechanism
(HM) [15]. The details of our AMM implementation are
available in Appendix B. All methods were implemented and tested
in Matlab on a desktop PC with an Intel quad-core 2.50 GHz CPU
and 4 GB of RAM. In all experiments, every algorithm is executed
20 times and the average performance is reported. We employ
three popular real datasets used in [15, 29]: Search Logs, Net
Trace and Social Network. Search Logs includes search keyword
statistics collected from Google Trends and America Online
between 2004 and 2010. Social Network gives the number of users
in a social network site with specific degrees in the social graph.
Net Trace is a statistical database containing the number of TCP
packets related to particular IP addresses, which is collected from a
university intranet. Search Logs, Net Trace and Social Network
contain $2^{16} = 65{,}536$, $2^{15} = 32{,}768$ and 11,342 entries
respectively. The reader is referred to [15] for more details of these
datasets. We published our Matlab implementations of all algorithms
used in the experiments, as well as sample datasets, online
at http://yuanganzhao.weebly.com/

To evaluate the impact of data domain cardinality on real datasets,
we transform the original counts into a vector of fixed size n (domain
size), by merging consecutive counts in order. Given the number
m of linear queries in the batch, we generate three different
types of workloads, namely WDiscrete, WRange and WRelated. In
WDiscrete, for each weight $W_{ij}$ of query $q_i$ in the batch, we
randomly set $W_{ij} = 1$ with probability 0.02 and $W_{ij} = -1$
otherwise. In WRange, a batch of range queries on the domain
is generated, by randomly picking the starting location a and
ending location b following a uniform distribution on the domain.
Given the interval (a, b), we set $W_{ij}$ of query $q_i$ in the batch to 1
for every $a \le j \le b$ and all other weights to 0. Finally, for WRelated,
we generate s (discussed later) independent base queries A
of size $s \times n$, by randomly assigning weights to the queries under
a standard normal distribution $N(0, 1)$. A correlation
matrix C of size $m \times s$ is generated similarly. The final workload
W of size $m \times n$ is the product of C and A.

Parameter   Values
$\gamma$    0.0001, 0.001, 0.01, 0.1, 1, 10
r           {0.8, 1.0, 1.2, 1.4, 1.7, 2.1, 2.5, 3.0, 3.6} × rank(W)
n           128, 256, 512, 1024, 2048, 4096, 8192
m           64, 128, 256, 512, 1024
s           {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} × min(m, n)

Table 1: Parameters used in the experiments
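The three workload types described above can be generated along the following lines. This is a minimal sketch in standard-library Python, not the authors' Matlab code; the function names are hypothetical, and treating the range endpoints as inclusive is an assumption.

```python
import random


def w_discrete(m, n, rng, p=0.02):
    """Each weight is 1 with probability p and -1 otherwise."""
    return [[1 if rng.random() < p else -1 for _ in range(n)] for _ in range(m)]


def w_range(m, n, rng):
    """Each query is an indicator vector of a uniformly drawn interval [a, b]."""
    W = []
    for _ in range(m):
        a, b = sorted((rng.randrange(n), rng.randrange(n)))
        W.append([1 if a <= j <= b else 0 for j in range(n)])
    return W


def w_related(m, n, s, rng):
    """W = C A with Gaussian C (m x s) and A (s x n), so rank(W) <= s."""
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(s)]
    C = [[rng.gauss(0.0, 1.0) for _ in range(s)] for _ in range(m)]
    return [[sum(C[i][k] * A[k][j] for k in range(s)) for j in range(n)]
            for i in range(m)]


rng = random.Random(0)                 # fixed seed for reproducibility
Wd = w_discrete(4, 8, rng)
Wr = w_range(4, 8, rng)
Wrel = w_related(4, 5, 1, rng)         # s = 1 gives a rank-one workload
```

The WRelated construction makes the low-rank structure explicit: however large m and n are, the rank of W cannot exceed s.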

We test the impact of five parameters in our experiments: $\gamma$, r,
n, m and s. $\gamma$ is the relaxation factor defined in Formula (8). r is
the number of columns in B (and also the number of rows in L).
n is the size of the domain and m is the number of queries in the
batch. Finally, s is the number of rows of queries in the base A,
which is only used in the generation of WRelated. The range of all
these five parameters is summarized in Table 1. Unless otherwise
specified, the default parameters in bold are used. Moreover, we
test three different privacy budgets, $\epsilon$ = 1, 0.1 and 0.01. Note that
the squared error incurred by all the methods is quadratic in $1/\epsilon$.

In the experiments, we measure Average Squared Error and
Computation Time of the methods. Specifically, the Average Squared
Error is the average squared $\ell_2$ distance between the exact query
answers and the noisy answers. In the following, we first examine
the impact of $\gamma$ and r, which are only used in the LRM method.
The results provide important insights on how to tune these two
parameters to maximize the utility of the LRM method.

Figure 5: Effect of domain size n on workload WRange with $\epsilon$ = 0.1
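The Average Squared Error metric described above can be sketched as follows. This is a minimal standard-library Python illustration; the function names are hypothetical, and averaging the per-run squared $\ell_2$ distance over repeated runs without normalizing by the number of queries is an assumption about the exact definition. Laplace noise is drawn as the difference of two independent exponential variables, which is a standard way to sample it.

```python
import random


def laplace_sample(scale, rng):
    """Draw from Lap(0, scale) as the difference of two Exp(mean=scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def average_squared_error(exact, noisy_runs):
    """Mean over runs of the squared l2 distance between exact and noisy answers."""
    total = 0.0
    for noisy in noisy_runs:
        total += sum((a - b) ** 2 for a, b in zip(exact, noisy))
    return total / len(noisy_runs)


# Hypothetical example: three exact answers, 20 noisy runs with scale 1/epsilon.
rng = random.Random(1)
epsilon = 0.1
exact = [10.0, 20.0, 30.0]
runs = [[a + laplace_sample(1.0 / epsilon, rng) for a in exact] for _ in range(20)]
ase = average_squared_error(exact, runs)
```

Since a Laplace variable with scale b has variance $2b^2$, the metric concentrates around $2b^2$ per query as the number of runs grows, which is why the error of every method is quadratic in $1/\epsilon$.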

6.1 Impact of $\gamma$ and r on LRM

In LRM, $\gamma$ is an important parameter controlling the relaxation
on the approximation of BL to W. In our first set of experiments,
we investigate the impact of $\gamma$ on the accuracy and the efficiency of
LRM. Figure 2 reports the performance of LRM under all three
workloads, WDiscrete, WRange and WRelated, on the Search
Logs dataset with varying values for $\gamma$. The results in the figure show
that the errors of LRM on all three workloads are not sensitive to
$\gamma$ in the range from $10^{-4}$ to 10. On the other hand, LRM executes
much faster with larger $\gamma$. This suggests that a larger value for $\gamma$
is preferred in practice, to achieve high efficiency without losing
much on result accuracy. Moreover, we also test with three different
values of the privacy budget $\epsilon$. Since the decomposition method
does not rely on $\epsilon$, the shapes of the result curves with different $\epsilon$
values are nearly identical, albeit at different scales. The average
error is quadratic in $1/\epsilon$, as expected.

In LRM, r is another important parameter that determines the
rank of the matrix BL that approximates the workload W. r affects
both the approximation accuracy and the optimization speed.
When r is too small, e.g., when r < rank(W), our optimization
formulation may fail to find a good approximation, leading to
suboptimal accuracy for the query batch. On the other hand, an overly
large r leads to poor efficiency, as the search space expands
dramatically. We thus test LRM with varying r, by controlling the ratio
of r to the actual rank rank(W), on the Search Logs dataset. We
record the average squared error under all the workloads and report
it in Figure 3.

There are several important observations in Figure 3. First, when
r < rank(W), the accuracy of LRM is far worse (up to two orders
of magnitude) than that in other settings. Second, the performance
of LRM is rather stable when r becomes larger than 1.2 · rank(W).
This is because the optimization formulation has enough freedom
to find the optimal decomposition when r is larger than rank(W).
Finally, the amount of computation spent on workload decomposition
increases exponentially with r. Thus, to balance the efficiency
and effectiveness of LRM, a good value for r is between rank(W)
and 1.2 · rank(W). We use the latter as the default value in the
subsequent experiments.

6.2 Impact of Varying Domain Size n 

We now evaluate the performance of all mechanisms with varying
domain size n. As mentioned earlier in this section, the domain
size is controlled by merging consecutive counts in the original
domain. While different workloads and datasets are used, we only
test settings with $\epsilon$ = 0.1 because $\epsilon$ does not have much impact on
the relative performance of different mechanisms. In Figures 4, 5
and 6, we report the result error rates of all these mechanisms.

In all experiments, the approximate Matrix Mechanism (AMM)
is much worse than the other mechanisms, sometimes by an order
of magnitude. This is mainly because the $\ell_2$ approximation used
by AMM does not lead to a good optimization of the actual objective
function formulated using the error measure in $\ell_1$. Because of
its poor performance, we exclude AMM in the rest of the experiments.

On the WDiscrete workload, the Laplace Mechanism (LM)
outperforms all other mechanisms when the domain size is relatively
small. This is in part due to the fact that the Wavelet Mechanism
(WM) and the Hierarchical Mechanism (HM) are mainly designed
to optimize range queries. While all other mechanisms incur linear
error in terms of the domain size n, LRM's error stops increasing
when the domain size is larger than 512. This is because LRM's
error relies on the rank of the workload matrix W, and rank(W)
is no larger than min(m, n) no matter how large n is. This
explains the excellent performance of LRM on larger domains. On
the WRange workload, the errors of WM and HM are smaller than
LM when the domain size is no smaller than 512, in which case
their strategies work better. LRM's performance is still significantly
better than any of them, since LRM fully utilizes the
correlations between these range queries on large domains. Finally,
on the WRelated workload, LRM achieves the best performance on
all test cases. The performance gap between LRM and other methods
is over two orders of magnitude when the domain size reaches
8192. Since WRelated naturally leads to a low rank workload matrix
W, this result verifies LRM's vast benefit from exploiting the
low-rank property of the workload.

Figure 7: Effect of number of queries m on workload WRange with $\epsilon$ = 0.1

6.3 Impact of Varying Query Size m 

In this subsection, we test the impact of the query set cardinality
m on the performance of the mechanisms. We mainly focus on
settings when the number of queries m is no larger than the domain
size n, i.e. $m \le n$. Due to space limitations, we only present the
results on WRange and WRelated workloads in Figures 7 and 8.

The results lead to several interesting observations. On the WRange
workload (Figure 7), LRM outperforms the other mechanisms when
the number of queries m is significantly smaller than n. With growing
m, the performance of all mechanisms on WRange tends to
converge. When m = 1024, WM achieves the best performance
among all mechanisms, since it is optimized for range queries. The
degradation in performance of LRM is due to the lack of the low rank
property when the batch contains too many random range queries.
On the WRelated workload, LRM is dramatically better than the other
methods, for any query set cardinality m. Regardless of the value
of m, the rank of the WRelated workload W remains low, since it is
solely determined by the parameter s used in the workload generation
procedure. These results further confirm that the squared error
generated by LRM scales linearly with the rank of the workload.

6.4 Impact of Varying Query Rank s 

All previous experiments demonstrate LRM's substantial performance
advantage when the workload matrix has low rank. In this
group of experiments, we manually control the rank of workload
W to verify the correctness of our claim. Recall that the parameter
s determines the size of the matrix $C \in \mathbb{R}^{m \times s}$ and the size of the
matrix $A \in \mathbb{R}^{s \times n}$ in the generation of the WRelated workload. When
C and A contain only independent rows/columns, s is exactly the
rank of the workload matrix W = CA. In Figure 9, we vary s from
0.1 · min(m, n) to min(m, n). Compared to the other mechanisms,
LRM maintains an accuracy advantage of over two orders of magnitude
when the rank of the workload matrix is low. With increasing
rank of W, the accuracy of the other mechanisms remains stable,
while LRM's error grows rapidly. This phenomenon again confirms
that the low rank property is the main reason behind LRM's
advantages with respect to error minimization.

7. CONCLUSION 

This paper presented the Low Rank Mechanism (LRM), an optimization
framework that minimizes the overall error in the results
of a batch of linear queries under $\epsilon$-differential privacy. LRM is the
first practical method for a large number of linear queries, with an
efficient and effective implementation using well established
optimization techniques. Experiments show that LRM significantly
outperforms other state-of-the-art differentially private query
processing mechanisms, often by orders of magnitude. The current
design of LRM focuses on exploiting the correlations between
different queries. One interesting direction for future work is to further
optimize LRM by also utilizing the correlations between data
values, e.g., as is done in [29, 24, 17].
APPENDIX
A. PROOFS 

Lemma 1

PROOF. Based on the definition of the mechanism in Eq. (6),
the residual of the noisy result with respect to the exact result, i.e.
$Q(D) - M_P(Q, D)$, is $B \cdot \mathrm{Lap}\!\left(\frac{\Delta(B, L)}{\epsilon}\right)^r$. The expected squared
error is thus $\sum_{ij} B_{ij}^2 \cdot \frac{2(\Delta(B, L))^2}{\epsilon^2}$. Since $\Phi(B, L) = \sum_{ij} B_{ij}^2$, the
expected error of the mechanism is $2\Phi(B, L)(\Delta(B, L))^2/\epsilon^2$. □

Lemma 2

PROOF. Based on the definition of sensitivity, we have

$$\Delta(B', L') = \max_j \sum_i |L'_{ij}| = \max_j \sum_i |L_{ij}/\alpha| = \alpha^{-1}\Delta(B, L).$$

The last equality holds because $\alpha$ is a positive constant. On the
other hand, the scales of the decompositions follow a similar
relationship:

$$\Phi(B', L') = \sum_{ij} (B'_{ij})^2 = \sum_{ij} \alpha^2 (B_{ij})^2 = \alpha^2 \Phi(B, L)$$

Therefore, $\Phi(B', L')(\Delta(B', L'))^2 = \Phi(B, L)(\Delta(B, L))^2$.
Finally, since $B'L' = BL = W$, we reach the conclusion of the
lemma. □
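Lemmas 1 and 2 can be illustrated numerically. The sketch below, in standard-library Python with hypothetical function names and toy matrices, computes the sensitivity $\Delta(B, L) = \max_j \sum_i |L_{ij}|$, the scale $\Phi(B, L) = \sum_{ij} B_{ij}^2$ and the expected squared error $2\Phi\Delta^2/\epsilon^2$, and checks that rescaling $(B, L)$ to $(\alpha B, L/\alpha)$ leaves the error unchanged.

```python
def sensitivity(L):
    """Delta(B, L): the largest column-wise l1 norm of L."""
    return max(sum(abs(row[j]) for row in L) for j in range(len(L[0])))


def scale(B):
    """Phi(B, L): the squared Frobenius norm of B."""
    return sum(v * v for row in B for v in row)


def expected_squared_error(B, L, eps):
    """Lemma 1: 2 * Phi(B, L) * Delta(B, L)^2 / eps^2."""
    return 2.0 * scale(B) * sensitivity(L) ** 2 / eps ** 2


def rescale(B, L, alpha):
    """Lemma 2 construction: B' = alpha * B, L' = L / alpha, so B'L' = BL."""
    return ([[alpha * v for v in row] for row in B],
            [[v / alpha for v in row] for row in L])


# Hypothetical toy decomposition.
B = [[1.0, 2.0], [0.0, 1.0]]
L = [[0.5, -1.0], [1.0, 0.25]]
eps = 0.1
err = expected_squared_error(B, L, eps)
B2, L2 = rescale(B, L, sensitivity(L))     # normalise the sensitivity to 1
err2 = expected_squared_error(B2, L2, eps)
```

Rescaling by $\alpha = \Delta(B, L)$ yields a decomposition with unit sensitivity and the same error, which is the step used in the proof of Theorem 1.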

Theorem 1

PROOF. Assume that $(B^*, L^*)$ is the best matrix decomposition
for minimizing the expected squared error for $M_P(Q, D)$. In the
following, we prove that $(B^*, L^*)$ is optimal, if and only if it also
minimizes the program in Formula (7).

(if part): Suppose (B, L) minimizes Formula (7) but incurs
more expected error than $(B^*, L^*)$, implying that

$$\Phi(B^*, L^*)(\Delta(B^*, L^*))^2 < \Phi(B, L)(\Delta(B, L))^2$$

By applying Lemma 2, we can construct another decomposition
$B' = \Delta(B^*, L^*)B^*$ and $L' = \Delta(B^*, L^*)^{-1}L^*$, such that
$\Phi(B', L')(\Delta(B', L'))^2 < \Phi(B, L)(\Delta(B, L))^2$. On the other hand,
since $\Delta(B', L') = 1$ by construction, we have $\max_j \sum_i |L'_{ij}| = 1$, so
$(B', L')$ is feasible. Since (B, L) is feasible, $\Delta(B, L) \le 1$. Therefore, we
can derive the following inequalities.

$$\Phi(B', L') = \Phi(B', L')(\Delta(B', L'))^2 < \Phi(B, L)(\Delta(B, L))^2 \le \Phi(B, L)$$

Finally, since $\Phi(B', L') = \mathrm{tr}(B'^T B')$ and $\Phi(B, L) = \mathrm{tr}(B^T B)$,
we obtain $\mathrm{tr}(B'^T B') < \mathrm{tr}(B^T B)$, which contradicts the optimality of (B, L).

(only if part): If $(B^*, L^*)$ is not the optimal solution to the
program in Formula (7), the optimal solution (B, L) must incur less
expected error, using a similar strategy. This completes the proof
of the theorem. □

Lemma 3

PROOF. To prove the lemma, we aim to artificially construct a
workload decomposition W = BL satisfying the constraints of the
optimization formulation. If the error of this artificial decomposition
is no larger than the upper bound, the exact optimal solution
must render results with less error.

Recall that W has an SVD decomposition $W = U \Sigma V$
such that $\Sigma$ is a diagonal matrix of size $r \times r$. We thus build a
decomposition $B = \sqrt{r}\, U \Sigma$ and $L = \frac{1}{\sqrt{r}} V$, in which r is the
rank of the matrix W. First, we will show such (B, L) satisfies the
constraints in Formula (7). It is straightforward to show it satisfies
the first constraint: $BL = \sqrt{r}\, U \Sigma \frac{1}{\sqrt{r}} V = U \Sigma V = W$.

Regarding the second constraint, since V only contains orthogonal
vectors, every column j must have $\|V_{:j}\|_2 \le 1$. By the
norm inequality $\|v\|_2 \le \|v\|_1 \le \sqrt{r}\|v\|_2$, we obtain
$\frac{1}{\sqrt{r}} \sum_i |V_{ij}| \le 1$. Therefore, such (B, L) must be a valid solution
to the program.

By Lemma 1, the expected squared error of the artificial decomposition
W = BL is at most

$$2\,\mathrm{tr}(B^T B)/\epsilon^2 = 2\,\mathrm{tr}\big((\sqrt{r}\, U \Sigma)^T (\sqrt{r}\, U \Sigma)\big)/\epsilon^2 = 2\,\mathrm{tr}(\Sigma^T U^T U \Sigma)\, r/\epsilon^2 = 2 \sum_{k=1}^{r} \lambda_k^2\, r/\epsilon^2$$

This proves that $O\big(\sum_{k=1}^{r} \lambda_k^2\, r/\epsilon^2\big)$ is an upper bound for the noise
of our decomposition-based scheme. □

Lemma 4

PROOF. In Corollary 3.4 in [14], Hardt and Talwar proved that
any $\epsilon$-differential privacy mechanism incurs expected squared error
no less than$^1$ $\Omega\big(r^3 (\mathrm{Vol}(PWB_1^n))^{2/r}/\epsilon^2\big)$.

In the formula above, $B_1^n$ is the $\ell_1$-unit ball. $\mathrm{Vol}(PWB_1^n)$ is
the volume of the unit ball after the linear transformation under
PW, in which P is any orthogonal linear transformation matrix
from $\mathbb{R}^m \to \mathbb{R}^r$. To prove the lemma, we construct an orthogonal
transformation P using the SVD decomposition $W = U \Sigma V$.
By simply letting $P = U^T$, since $U^T U$ and $V V^T$ are identity
matrices, we have

$$\mathrm{Vol}(PWB_1^n) = \mathrm{Vol}(P U V V^T \Sigma V B_1^n) = \mathrm{Vol}(V (V^T \Sigma V) B_1^n) = \mathrm{Vol}(V B_1^n) \prod_{k=1}^{r} \lambda_k.$$

The last equality holds due to Lemma 7.5 in [14]. Consider the convex body
$V B_1^n$. It is an r-dimensional unit ball after the orthogonal
transformation under V. Note that $\mathrm{Vol}(B_1^r)$ can be computed using the
well known $\Gamma$ function, as in [25]: $\frac{2^r}{\Gamma(1 + r)} = \frac{2^r}{r!}$. Therefore,
the lower bound can be computed as: $\Omega\Big(\big(\frac{2^r}{r!} \prod_{k=1}^{r} \lambda_k\big)^{2/r} r^3/\epsilon^2\Big)$.
This reaches the conclusion of the lemma. □

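Theorem 2

PROOF. To prove the theorem, we investigate the ratio of the
upper bound to the lower bound, where $\lambda_1 \ge \dots \ge \lambda_r$ and
$C = \lambda_1/\lambda_r$ is the ratio of the largest to the smallest eigenvalue:

$$\frac{\sum_{k=1}^{r} \lambda_k^2\, r/\epsilon^2}{\big(\frac{2^r}{r!} \prod_{k=1}^{r} \lambda_k\big)^{2/r} r^3/\epsilon^2} \le \frac{r \lambda_1^2}{\big(\frac{2^r}{r!}\big)^{2/r} \lambda_r^2\, r^2} = \frac{C^2 (r!)^{2/r}}{4r} \le \frac{C^2 r}{16}$$

The first inequality uses $\sum_k \lambda_k^2 \le r\lambda_1^2$ and $\prod_k \lambda_k \ge \lambda_r^r$.
The last inequality holds due to the fact that $r! \le (r/2)^r$ when
$r > 5$. Note that all the inequalities above are tight, and the
equalities hold when C = 1, i.e. $\lambda_1 = \lambda_2 = \dots = \lambda_r$. Thus, we
prove that the approximation factor of our decomposition scheme
is $O(C^2 r)$. □

$^1$ [14] used absolute error in the paper, which we change to squared
error here.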
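Two numerical facts used in the proofs of Lemma 4 and Theorem 2 are easy to check directly: the volume $2^r/r!$ of the r-dimensional $\ell_1$ unit ball, and the inequality $r! \le (r/2)^r$, which fails at r = 5 and holds from r = 6 onward. The following standard-library Python sketch uses hypothetical function names.

```python
import math


def l1_ball_volume(r):
    """Volume of the r-dimensional l1 unit ball: 2^r / r!."""
    return 2.0 ** r / math.factorial(r)


def factorial_bound_holds(r):
    """Check r! <= (r/2)^r, the inequality used in the proof of Theorem 2."""
    return math.factorial(r) <= (r / 2.0) ** r


area_2d = l1_ball_volume(2)      # the diamond |x| + |y| <= 1
volume_3d = l1_ball_volume(3)    # the regular octahedron with unit vertices
holds_at_5 = factorial_bound_holds(5)
holds_from_6 = all(factorial_bound_holds(r) for r in range(6, 61))
```

The rapid decay of $2^r/r!$ with r is what turns the volume-based lower bound into a bound that depends on the rank of the workload.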

Theorem 3

PROOF. When $W \ne BL$, the error has two parts. The first part
is the noise due to the Laplace random variables. Using Lemma 1,
the incurred error is at most

$$\frac{2}{\epsilon^2}\Phi(B, L)(\Delta(B, L))^2 \le \frac{2}{\epsilon^2}\mathrm{tr}(B^T B).$$

The second part of the error is the structural error on the results.
The expected squared error is measured as

$$((W - BL)D)^T (W - BL)D \le \|W - BL\|_F^2\, D^T D = \|W - BL\|_F^2 \sum_{i=1}^{n} x_i^2$$

The inequality is due to the Cauchy-Schwarz inequality. By
linearity of expectation, the expected squared errors can be simply
summed up. This leads to the conclusion of the theorem. □

Theorem 4

PROOF. We use $(B^{(k+1)*}, L^{(k+1)*})$ to denote the optimal solution of the
Lagrangian sub-problem in the k-th iteration. Note the following
inequality on the sequence of the Lagrangian subproblems:

$$J(B^{(k+1)*}, L^{(k+1)*}, \pi^{(k)*}, \beta^{(k)}) = \min_{\forall j \sum_i |L_{ij}| \le 1} J(B, L, \pi^{(k)*}, \beta^{(k)})$$
$$\le \min_{\|W - BL\|_F \le \gamma,\ \forall j \sum_i |L_{ij}| \le 1} J(B, L, \pi^{(k)*}, \beta^{(k)})$$
$$= \min_{\|W - BL\|_F \le \gamma,\ \forall j \sum_i |L_{ij}| \le 1} \frac{1}{2}\mathrm{tr}(B^T B) = \frac{1}{2}\mathrm{tr}(B^{*T} B^{*})$$

Based on the above inequality, we derive the following inequality:

$$\frac{1}{2}\mathrm{tr}(B^{(k+1)T} B^{(k+1)})$$
$$= J(B^{(k+1)*}, L^{(k+1)*}, \pi^{(k)*}, \beta^{(k)}) - \Big(\langle \pi^{(k)*}, W - B^{(k+1)} L^{(k+1)} \rangle + \frac{\beta^{(k)}}{2}\|W - B^{(k+1)} L^{(k+1)}\|_F^2\Big)$$
$$= J(B^{(k+1)*}, L^{(k+1)*}, \pi^{(k)*}, \beta^{(k)}) - \frac{1}{2\beta^{(k)}}\Big(\|\pi^{(k)*} + \beta^{(k)}(W - B^{(k+1)} L^{(k+1)})\|_F^2 - \|\pi^{(k)*}\|_F^2\Big)$$
$$= J(B^{(k+1)*}, L^{(k+1)*}, \pi^{(k)*}, \beta^{(k)}) - \frac{1}{2\beta^{(k)}}\Big(\|\pi^{(k+1)*}\|_F^2 - \|\pi^{(k)*}\|_F^2\Big)$$
$$\le \frac{1}{2}\mathrm{tr}(B^{*T} B^{*}) - \frac{1}{2\beta^{(k)}}\Big(\|\pi^{(k+1)*}\|_F^2 - \|\pi^{(k)*}\|_F^2\Big)$$

The third equality holds because of the Lagrangian multiplier
update rule:

$$\pi^{(k+1)*} = \pi^{(k)*} + \beta^{(k)}(W - B^{(k+1)*} L^{(k+1)*})$$

Since $\pi^{(k)*}$ is always bounded, we conclude that

$$\frac{1}{2}\mathrm{tr}(B^{(k+1)T} B^{(k+1)}) - \frac{1}{2}\mathrm{tr}(B^{*T} B^{*}) \le O\!\left(\frac{1}{\beta^{(k)}}\right)$$

This completes the proof of the theorem. □

B. IMPLEMENTATION OF THE MATRIX MECHANISM

In [16], Li et al. propose the Matrix Mechanism. The core of
their method is finding a matrix A to minimize the following
program.

$$\min_{A \in \mathbb{R}^{r \times n}} \|A\|_{2,\infty}^2\, \mathrm{tr}(W^T W A^{\dagger} A^{\dagger T}) \qquad (12)$$

Li et al. [16] present a complicated implementation that may not
be practical due to its high complexity. We hereby present a simpler
and more efficient solution to their optimization program. Here
$\|A\|_{2,\infty}$ denotes the maximum $\ell_2$ norm of column vectors of A,
therefore $\|A\|_{2,\infty}^2 = \max(\mathrm{diag}(A^T A))$. Since $(A^T A)^{-1} = (A^T A)^{\dagger}$
(A has full column rank), we let $M = A^T A$, and reformulate
Formula (12) as the following semidefinite programming problem:



$$\min_{M \in \mathbb{R}^{n \times n}} \max(\mathrm{diag}(M))\, \mathrm{tr}(W^T W M^{-1}) \quad \text{s.t. } M \succ 0$$

A is given by $A = \sum_{i=1}^{n} \sqrt{\lambda_i}\, v_i v_i^T$, where $\lambda_i$, $v_i$ are the ith
eigenvalue and eigenvector of M, respectively. Calculating the
second term $\mathrm{tr}(W^T W M^{-1})$ is relatively straightforward. Since
it is smooth, its gradient can be computed as $-M^{-1} W^T W M^{-1}$.
However, calculating the first term $\max(\mathrm{diag}(M))$ is harder since
it is non-smooth. Fortunately, inspired by [7], we can still use a
logarithmic and exponential function to approximate this term.

Approximate the maximum positive number: Since M is positive
definite, $v = \mathrm{diag}(M) > 0$. We let $\mu > 0$ and define:

$$f_{\mu}(v) = \mu \log \sum_{i=1}^{n} \exp\!\left(\frac{v_i}{\mu}\right) \qquad (13)$$

We then have $\max(v) \le f_{\mu}(v) \le \max(v) + \mu \log n$. If we set
$\mu = \frac{\epsilon}{\log n}$, this becomes a uniform $\epsilon$-approximation of $\max(v)$
with a Lipschitz continuous gradient with constant $\omega = \frac{1}{\mu} = \frac{\log n}{\epsilon}$.
The gradient of the objective function with respect to v can be
computed as:

$$\frac{\partial f_{\mu}}{\partial v_i} = \frac{\exp(v_i/\mu)}{\sum_{j=1}^{n} \exp(v_j/\mu)} \qquad (14)$$

To mitigate the problems with large numbers, using the property
of the logarithmic and exponential functions, we can rewrite
Eq. (13) and Eq. (14) as:

$$f_{\mu}(v) = \max(v) + \mu \log \sum_{i=1}^{n} \exp\!\left(\frac{v_i - \max(v)}{\mu}\right), \qquad \frac{\partial f_{\mu}}{\partial v_i} = \frac{\exp\!\big(\frac{v_i - \max(v)}{\mu}\big)}{\sum_{j=1}^{n} \exp\!\big(\frac{v_j - \max(v)}{\mu}\big)}$$

This formulation allows us to run the non-monotone projected
gradient descent algorithm [2] and iteratively improve the result.


